Privacy campaign group targets OpenAI in complaint about ChatGPT’s ‘hallucinations’
Posted: May 10, 2024
The campaign group noyb, headed by privacy campaigner Max Schrems, has accused OpenAI of violating the GDPR’s rules on data accuracy and individual rights in a complaint to the Austrian Data Protection Authority (DPA).
The complaint involves some fundamental aspects of Large Language Models (LLMs): their tendency to “hallucinate” false personal data as outputs, and their opaque systems and processes.
Here’s a look at why noyb thinks ChatGPT might be incompatible with the GDPR.
Background: The date-of-birth data error
This complaint began when the complainant, represented by noyb, asked ChatGPT for his date of birth. The chatbot replied inaccurately, and the complainant submitted the following requests to OpenAI, the data controller responsible for ChatGPT:
- To access his personal data
- To access information about the sources of his personal data, how long it will be stored, OpenAI’s legal basis for processing it, and any third parties with which it will be shared
- To erase his “incorrect date of birth from the results displayed by ChatGPT”
Noyb’s complaint to the Austrian DPA has been redacted, so we don’t know who the ChatGPT user in question was – but we know he’s male, lives in Austria, and is a “public figure”.
The complaint also references attachments, such as the data subject requests submitted to OpenAI, which are not publicly available.
Some elements of the case, such as the complainant’s request that OpenAI “erase his incorrect date of birth from the results displayed by ChatGPT”, are somewhat unclear but might be explained by the attachments submitted to the Austrian DPA.
OpenAI’s response
OpenAI reportedly responded to the complainant’s 4 December 2023 request on 7 February 2024 – over a month late according to the GDPR’s standard timescales. However, the complaint does not say whether this was OpenAI’s first response or whether OpenAI gave itself an extension (as is permitted “where necessary”).
Noyb says that OpenAI responded to the requests by:
- Providing a copy of the complainant’s account details
- Failing to provide the requested information about the source (etc.) of the complainant’s personal data
- Noting that the company uses filters to block certain personal data from appearing in its outputs on request
- Explaining that it could not prevent ChatGPT from producing the complainant’s incorrect date of birth “without affecting other pieces of information” that the chatbot would display about him
ChatGPT’s filters were introduced in response to an ongoing investigation by the Italian DPA. The issue here appears to be that they take an all-or-nothing approach and cannot block the specific incorrect output identified by noyb’s complainant.
Noyb’s complaint
Noyb’s complaint to the Austrian DPA alleges violations of three parts of the GDPR:
- Article 5(1)(d): The principle of accuracy, which states that personal data must be “accurate and, where necessary, kept up to date…”
- Article 12(3): This provision relates to the timing and “modalities” required when responding to data subject rights requests, suggesting that OpenAI might have responded late or in a manner that does not meet the GDPR’s requirements.
- Article 15: The right of access, including the right to obtain information about the processing of personal data and a copy of the data itself.
Despite the complainant’s apparent dissatisfaction with how OpenAI handled his request under the “right to erasure”, the complaint does not cite a violation of Article 17 of the GDPR, which sets out the relevant requirements in relation to this right.
The complainant also requests that the Austrian DPA handle his complaint directly, rather than forwarding it to Ireland, where OpenAI has a European establishment.
Under the GDPR’s “one-stop-shop” process, a controller’s “lead supervisory authority” (in this case, the Irish Data Protection Commission) should normally handle complaints relevant to people in multiple EU countries.
Noyb argues that the one-stop-shop process does not apply in this case, as the decisions concerning ChatGPT are allegedly made at OpenAI’s headquarters in California, rather than in Ireland.
A misunderstanding of AI?
Noyb’s complaint strikes at fundamental aspects of how ChatGPT and other LLMs work. Because these models produce text based on the statistical likelihood that one set of characters will follow another, some inaccurate outputs seem inevitable.
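The point can be illustrated with a deliberately simplified sketch. The toy bigram model below (nothing like OpenAI’s actual architecture, and using invented names and dates) picks each next word purely by how often it followed the previous word in its training text, with no notion of factual accuracy, so a date belonging to one person can plausibly be emitted for another:

```python
import random
from collections import defaultdict

# Hypothetical training text: the names and dates are invented.
corpus = (
    "alice was born in 1901 . "
    "alice was born in vienna . "
    "bob was born in 1950 ."
).split()

# Count how often each word follows each other word (a bigram model).
counts = defaultdict(lambda: defaultdict(int))
for prev, nxt in zip(corpus, corpus[1:]):
    counts[prev][nxt] += 1

def next_word(prev):
    """Sample the next word in proportion to its observed frequency."""
    options = counts[prev]
    words = list(options)
    weights = [options[w] for w in words]
    return random.choices(words, weights=weights)[0]

# After "... born in", the model may emit "1901", "vienna", or "1950":
# Bob's birth year is a statistically valid continuation even when the
# sentence began with Alice. Likelihood, not truth, drives the output.
print(next_word("in"))
```

Real LLMs operate over tokens rather than words and use vastly more context, but the underlying principle is the same: the model optimizes for plausible continuations, not verified facts about individuals.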
Furthermore, OpenAI treats its training data as commercially sensitive. This appears to conflict with the GDPR’s requirement that controllers tell data subjects about where they obtained their personal data.
And training an LLM is expensive and resource-intensive. Training GPT-4, OpenAI’s latest ChatGPT model, reportedly cost around $100 million, making it prohibitively expensive to retrain the model in response to requests to correct or delete personal data.
As such, the outcome of noyb’s complaint (or of one of the other investigations into ChatGPT underway in Poland, Germany, and other EU member states) could have serious consequences for the future of generative AI.
Read our Privacy professionals’ AI checklist
Though AI technology and legislation are evolving rapidly, clear patterns have emerged that allow savvy businesses to get ahead of the AI train. To help your organization make privacy-conscious, future-proofed AI decisions, use our AI top 10 checklist, which covers:
- Identifying data goals, strategy, and tactics
- Determining a legal basis
- Solving transborder data flow concerns
- Considering data sets